Rminer: An integrated model for repository mining using Rascal

Author

  • Jurgen Vinju
Abstract

This thesis examines the feasibility of an integrated model for repository mining using Rascal. To this end, the SCM data sources CVS, SVN and Git are combined into a single repository model named Rminer, which can be used in the data extraction and analysis phases of MSR research. First, the requirements for Rminer are analysed by surveying existing MSR research and by examining the commonalities and differences between the three SCM systems to be supported. After the design and implementation of the tool, a case study is performed to evaluate the feasibility of an integrated model for repository mining in Rascal. We found that, while it is possible to create an integrated repository model by unifying the commonalities and making the differences between the SCM systems explicit, MSR researchers still need to be careful when using the unified data in the model, because the semantics of the SCM systems may differ.
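The idea of unifying the commonalities while keeping the differences explicit can be illustrated with a minimal sketch. It is written in Python rather than Rascal, and it is not Rminer's actual data model: every name below (`Scm`, `Revision`, `is_atomic_changeset`, `is_merge`) is hypothetical. The sketch relies only on two well-known differences: CVS records revisions per file, and a Git commit can have more than one parent.

```python
from dataclasses import dataclass
from enum import Enum


class Scm(Enum):
    """The three systems named in the abstract."""
    CVS = "cvs"
    SVN = "svn"
    GIT = "git"


@dataclass(frozen=True)
class Revision:
    # Fields that all three systems can supply (hypothetical names).
    scm: Scm
    identifier: str   # e.g. "1.4" (CVS), "1042" (SVN), a hash (Git)
    author: str
    message: str
    # A system-specific fact kept explicit instead of being hidden.
    parents: tuple = ()


def is_atomic_changeset(rev: Revision) -> bool:
    """CVS records revisions per file; SVN and Git record whole changesets."""
    return rev.scm is not Scm.CVS


def is_merge(rev: Revision) -> bool:
    """A revision with more than one recorded parent is treated as a merge."""
    return len(rev.parents) > 1
```

An analysis that counts "commits" across systems would consult `is_atomic_changeset` before comparing numbers, which is the kind of care the abstract asks for when using unified data.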

Similar resources

Data Mining with Neural Networks and Support Vector Machines Using the R/rminer Tool

We present rminer, our open source library for the R tool that facilitates the use of data mining (DM) algorithms, such as neural networks (NNs) and support vector machines (SVMs), in classification and regression tasks. Tutorial examples with real-world problems (i.e. satellite image analysis and prediction of car prices) were used to demonstrate the rminer capabilities and NN/SVM advantages. ...

An Integrated DEA and Data Mining Approach for Performance Assessment

This paper presents a data envelopment analysis (DEA) model combined with bootstrapping to assess the performance of a data mining algorithm. We applied a two-step process for productivity analysis of insurance branches within a case study. First, using a DEA model, the study analyzes the productivity of eighteen decision-making units (DMUs). Using a Malmquist index, DEA deter...

Cytochrome P450 site of metabolism prediction from 2D topological fingerprints using GPU accelerated probabilistic classifiers

BACKGROUND: The prediction of sites and products of metabolism in xenobiotic compounds is key to the development of new chemical entities, where screening potential metabolites for toxicity or unwanted side-effects is of crucial importance. In this work 2D topological fingerprints are used to encode atomic sites and three probabilistic machine learning methods are applied: Parzen-Rosenblatt Wind...

Recommending Library Methods: An Evaluation of the Vector Space Model (VSM) and Latent Semantic Indexing (LSI)

The development and maintenance of a reuse repository requires significant investment, planning and managerial support. To minimise risk and ensure a healthy return on investment, reusable components should be accessible, reliable and of a high quality. In this paper we concentrate on accessibility; we describe a technique which enables a developer to effectively and conveniently make use of la...

RASCAL: A Recommender Agent for Software Components in an Agile Environment

As software organisations mature, their repository of reusable software components from previous projects will grow considerably. Remaining conversant with all components in such a repository presents a significant challenge to developers. Indeed, the retrieval of a particular component in this large search space may prove problematic. We propose to infer the need for a component and proactively...

Publication date: 2010